Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IRI "normalization" #128

Merged
merged 2 commits into from
Nov 30, 2014
Merged

Fix IRI "normalization" #128

merged 2 commits into from
Nov 30, 2014

Conversation

ozh
Copy link
Collaborator

@ozh ozh commented Oct 29, 2014

Transport curl fails on http://example.com/#hash (fragment attached to domain directly)

Why it fails

When requesting http://example.com/#hash, the actual URL requested becomes http://example.com#hash (no path set, the trailing slash has gone missing)

fsockopen doesn't mind, but curl dies, just as a manual curl request on http://example.com/#hash works fine but fails on http://example.com#hash, with error Could not resolve host: example.com#hash; Host not found

Where it fails

URL http://example.com/#hash becomes http://example.com#hash during the following flow of events :

  • Request::request( $url, ... ) calls Request::set_default($url, ...)
  • here, if ($options['idn'] !== false) invokes new Requests_IRI($url)
  • Requests_IRI::scheme_normalization() empties the path portion, as dictates the protected $normalization array here

What the PR fixes

The normalization removes unwanted bits, eg http://example.com:80/bleh becomes http://example.com/bleh. Alrighty. But it goes too far.

The PR prevents the normalization to think / is an unneeded path

The PR also makes sure that if we have a host but no path, the default / will be added.
This would probably cause problems on non http(s) requests (not sure what the requirements are for dict: requests, for instance) but for Requests here that's not going to be a problem.

As a result:

  • http://example.com becomes http://example.com/
  • http://example.com:80#bleh becomes http://example.com/#bleh
  • any other kind of valid URL stays the same.

Executive summary

pull_request_sam

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.12%) when pulling d93df96 on ozh:hashproblem into b398534 on rmccue:master.

@ozh
Copy link
Collaborator Author

ozh commented Oct 29, 2014

Yeah of course all those silly tests assuming http://bleh/ should be changed to http://bleh would need an update now :) I can add that if the principle of the PR is OK with you.

@rmccue
Copy link
Collaborator

rmccue commented Nov 22, 2014

You know, I've always hated that it does this, so huge 👍 (and sorry for taking so long to get back to this). I know @gsnedders wrote this code originally for SimplePie, so I'd be curious to know why it works this way.

As far as I know, this also doesn't obey the RFC anyway. RFC 3987 states:

In general, an IRI that uses the generic syntax for authority with an
empty path should be normalized to a path of "/".

Let's rock this.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.12%) when pulling aca2b1e on ozh:hashproblem into b398534 on rmccue:master.

@ozh
Copy link
Collaborator Author

ozh commented Nov 22, 2014

\m/

@ozh
Copy link
Collaborator Author

ozh commented Nov 22, 2014

If you can tag that release it would be neat, I want to use this for YOURLS 1.7.1

@gsnedders
Copy link

I believe I originally always went with the shortest form for the normalization, FWIW. I don't think there was any other logic. Per RFC 2616 (now obsolete) "/" and "" are equivalent paths. Note that RFC 7230 states:

When not being used in absolute form as the request target of an OPTIONS request, an empty path component is equivalent to an absolute path of "/", so the normal form is to provide a path of "/" instead.

So, yeah, normalizing to "/" is probably the right thing to do.

Note also Anne's URL spec, which is probably more useful in the real world than RFC 3987 in many ways. Maybe not in ways relevant to Requests, though?

@ozh
Copy link
Collaborator Author

ozh commented Nov 29, 2014

Ping @rmccue! Merging?

@rmccue
Copy link
Collaborator

rmccue commented Nov 30, 2014

@ozh Thanks a bunch!

rmccue added a commit that referenced this pull request Nov 30, 2014
@rmccue rmccue merged commit 3244985 into WordPress:master Nov 30, 2014
@rmccue rmccue added this to the 1.7 milestone Nov 30, 2014
@ozh ozh deleted the hashproblem branch April 17, 2015 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants